Development of immune‐related cell‐based machine learning for disease progression and prognosis of alcoholic liver disease

Dear Editor, Immune-related cell (IRC)-based machine learning (ML) models, including random forest (RF), multilayer perceptron (MLP), generalised linear model (GLM) and gradient boosting machine (GBM), performed well in the estimation of alcoholic liver disease (ALD).
Pathological biopsy of the liver is considered the most reliable method for establishing the diagnosis and evaluating the staging and prognosis of liver disease, but it can cause complications, and specific noninvasive diagnostic biomarkers are still lacking.1 IRCs are biomarkers of systemic inflammation, and identifying changes in IRCs associated with ALD will help improve its diagnosis.2,3 However, current biomarker discovery usually focuses on an individual biomolecule, resulting in low clinical applicability,4,5 and few studies have investigated the diagnostic value of IRCs in ALD. Therefore, there is an urgent need for an accurate and sensitive noninvasive test that is low cost and low risk. Recently, ML has been widely used in biomedical research and disease diagnosis, and it is a promising method for identifying hepatic fibrosis and cirrhosis.3,4 However, no studies have investigated the role of ML models, especially IRC-based ML models, in the diagnosis, disease progression and prognosis of ALD.
In this study, we developed IRC-based ML models to assess ALD progression and prognosis. Overall, 207 ALD patients (including those with alcoholic fatty liver (AFL) and alcoholic cirrhosis (ALC)) and 234 healthy controls (HCs) were included (Figure 1A and Table 1). ML performed well in the estimation of ALD. To explore the role of IRCs in disease onset, we compared ALD patients with HCs. Gender, age, mean corpuscular volume (MCV), platelets, red blood cells (RBCs), neutrophil/lymphocyte ratio (NLR) and monocyte/neutrophil ratio (MNR) were selected by least absolute shrinkage and selection operator (LASSO) regression and included in the nomogram (Figure 1B-D). Decision curve analysis (DCA) showed that the net benefit of the nomogram ranged from 0.03 to 0.99 (Figure 1C), suggesting that the predictive ability and accuracy of the fitted model are high. Here, we present the findings of the RF model, as it had the best performance (Figure 1E-H and Table S1). The areas under the precision-recall (PR) and receiver operating characteristic (ROC) curves (AUCs) were 0.9982 and 0.9975 in the training set, and 0.9984 and 0.9978 in the testing set, respectively (Figure 1E and F and Table 1). In addition, the evaluation metrics of the ML models (i.e., root mean square error (RMSE) and mean square error (MSE)) are shown in Table S2, indicating that the ML models are robust and reliable. These results suggest that ML models, especially the RF model, have both high precision and high predictive ability. As sensitivity analyses, we built additional ML models for ALD using variables that had not been screened by LASSO regression (Figure 1I-L). The results were consistent with those obtained after LASSO screening for most outcomes, which means that our results are robust and that good predictions can be obtained from several important variables instead of all variables.
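The evaluation measures named above (ROC AUC, PR AUC, MSE/RMSE and the DCA net benefit) can be illustrated with a minimal sketch. It assumes binary labels (1 = ALD, 0 = HC) and predicted probabilities; the function names and the toy data are hypothetical and are not taken from the study. PR AUC is approximated here by average precision, and net benefit uses the standard decision-curve definition, TP/n - FP/n x pt/(1 - pt), at a single threshold probability pt.

```python
import math


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision (one common estimate of PR AUC): mean precision at each positive."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest score (tied scores keep input order).
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def mse(y_true, scores):
    """Mean square error between labels and predicted probabilities."""
    return sum((y - s) ** 2 for y, s in zip(y_true, scores)) / len(y_true)


def rmse(y_true, scores):
    """Root mean square error."""
    return math.sqrt(mse(y_true, scores))


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit at one threshold probability pt in (0, 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data, for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("ROC AUC:", roc_auc(y_true, scores))
print("Average precision:", average_precision(y_true, scores))
print("MSE:", mse(y_true, scores), "RMSE:", rmse(y_true, scores))
print("Net benefit at pt = 0.5:", net_benefit(y_true, scores, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and comparing the result with the treat-all and treat-none strategies.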
To further investigate the role of IRCs in the disease progression of ALD, we conducted subgroup analyses in AFL and ALC patients. First, 13 variables may be potential risk factors for AFL, and 8 of them (gender, basophils, MCV, lymphocytes, neutrophils, platelets, NLR and MNR) were retained. The RF model had the best performance, and the AUCs of PR and ROC were 0.9984 and 0.9990 in the training set, and 1.0000 and 1.0000 in the testing set, respectively (Figure 2A-G and Table 1). Second, 17 variables may be related to ALC, and 10 variables were eventually retained in the LASSO regression and nomogram (Figure 2H-N), that is, age, gender, MCV, lymphocytes, platelets, RBCs, platelet/lymphocyte ratio (PLR), platelet/neutrophil ratio (PNR), MNR and NLR. The RF model had the best performance (Figure 2M and N and Table 1). Finally, 13 variables may be related to disease progression when comparing ALC with AFL, and 11 variables were eventually retained in the LASSO regression and nomogram. The RF model had the best performance (Table 1 and Figure S1).
To further explore the role of IRCs in the prognosis of ALD, ML based on the model for end-stage liver disease (MELD) and Maddrey's discriminant function (MDF) scores was used to predict disease severity. MELD-associated LASSO regression found that basophils, MCV, neutrophils, RBCs and white blood cells (WBCs) were potential prognostic factors (Figure 3A-G). The RF model had high performance (Figure 3F and G and Table 1). In addition, MDF-associated LASSO regression found that eosinophils, platelets, RBCs, PMR and PNR were potential prognostic factors (Figure 3H-N). The MLP model had high performance (Figure 3M and N and Table 1). The significant IRCs and the best-performing model for MDF differ from those for MELD. One possible explanation is that the two scores evaluate severity in different ways.
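As an illustration of the ratio features and severity scores mentioned above, the following sketch computes NLR, MNR, PLR and PNR from blood counts, together with MDF and MELD. The letter does not state which score formulas were used; the versions below are the commonly cited ones (MDF = 4.6 x (patient prothrombin time - control prothrombin time) + total bilirubin; the original MELD with a lower bound of 1.0 on each input and creatinine capped at 4.0 mg/dL) and should be read as an assumption. All function and parameter names are hypothetical.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if a value is missing or the denominator is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def irc_ratios(neutrophils, lymphocytes, monocytes, platelets):
    """Ratios as defined in the letter; all counts must share one unit (e.g. 10^9/L)."""
    return {
        "NLR": safe_ratio(neutrophils, lymphocytes),  # neutrophil/lymphocyte
        "MNR": safe_ratio(monocytes, neutrophils),    # monocyte/neutrophil
        "PLR": safe_ratio(platelets, lymphocytes),    # platelet/lymphocyte
        "PNR": safe_ratio(platelets, neutrophils),    # platelet/neutrophil
    }


def mdf_score(patient_pt_s, control_pt_s, bilirubin_mg_dl):
    """Maddrey's discriminant function (assumed standard form); prothrombin times in seconds."""
    return 4.6 * (patient_pt_s - control_pt_s) + bilirubin_mg_dl


def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """Original MELD (assumed variant): inputs below 1.0 are raised to 1.0, creatinine capped at 4.0."""
    bilirubin = max(bilirubin_mg_dl, 1.0)
    inr_value = max(inr, 1.0)
    creatinine = min(max(creatinine_mg_dl, 1.0), 4.0)
    return (
        3.78 * math.log(bilirubin)
        + 11.2 * math.log(inr_value)
        + 9.57 * math.log(creatinine)
        + 6.43
    )


# Hypothetical values, for illustration only (not study data).
print(irc_ratios(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=200.0))
print("MDF:", mdf_score(patient_pt_s=18.0, control_pt_s=12.0, bilirubin_mg_dl=5.0))
print("MELD:", meld_score(bilirubin_mg_dl=3.0, inr=1.5, creatinine_mg_dl=1.2))
```

Returning `None` for a zero or missing denominator keeps undefined ratios out of downstream models instead of silently producing infinities.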
ML represents the future of clinical medicine and will be implemented in healthcare systems to facilitate early identification of target populations. We used four ML models to predict ALD, which were constructed based on IRCs that have significant roles in liver injury, fibrogenesis and regeneration.6 Different IRCs play different leading roles in disease progression and prognosis, which means that appropriate cells are needed to assess the different stages of the disease. Neutrophils, the most abundant circulating WBCs, are implicated in the immune pathogenesis of ALD and have complex, multifaceted roles: they directly cause hepatocyte injury and liver inflammation in ALD, whereas neutrophil-induced cytokines have a protective role in inflammation and liver repair.7,8 In addition to neutrophil counts alone, interactions between neutrophils and other immune cells (e.g., PNR, MNR and NLR) were also relevant features of the ML models. Platelet interactions with IRCs, especially neutrophils, have been extensively investigated in a variety of diseases6,9 but not in liver diseases. Future studies are highly recommended to explore the diversity and complexity of neutrophil functions in order to formulate targeted interventions and treatments for ALD. In addition, systemic inflammation is often accompanied by changes in RBCs. For example, RBCs increase the risk of liver disease, and RBC count is related to the fatty liver index and disease progression.10
In conclusion, IRCs and their interactions are of great importance in ALD, and IRC-based ML models, especially the RF model, are accurate and inexpensive tools for identifying the progression and prognosis of ALD. This study could be used as a basis for the development of ML models for disease prediction.

ACKNOWLEDGEMENTS
We thank all the subjects who participated in this study.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

FUNDING INFORMATION
XZ is supported by the National Natural Science Foundation of China (82100628), the Natural Science Foundation of Anhui Province (2108085QH313) and the Postdoctoral Research Foundation of China (2021M700183B496). YS is supported by the National Natural Science Foundation of China (82100627) and the Natural Science Foundation of Anhui Province (2108085QH311).